Concurrent checkpoint initiation and recovery algorithms on asynchronous ring network

Authors

  • Partha Sarathi Mandal
  • Krishnendu Mukhopadhyaya
Abstract

Checkpointing with rollback recovery is a well-known method for achieving fault-tolerance in distributed systems. In this work, we introduce algorithms for checkpointing and rollback recovery on asynchronous unidirectional and bi-directional ring networks. The proposed checkpointing algorithms can handle multiple concurrent initiations by different processes. While taking checkpoints, processes do not have to take into consideration any application message dependency. The synchronization is achieved by passing control messages among the processes. Application messages are acknowledged. Each process maintains a list of unacknowledged messages. Here we use a logical checkpoint, which is a standard checkpoint (i.e., snapshot of the process) plus a list of messages that have been sent by this process but are unacknowledged at the time of taking the checkpoint. The worst case message complexity of the proposed checkpointing algorithm is O(kn) when k initiators initiate concurrently. The time complexity is O(n). For the recovery algorithm, time and message complexities are both O(n). © 2004 Elsevier Inc. All rights reserved.
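
The logical checkpoint described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' algorithm: it omits the ring topology, the control messages, concurrent initiation, and recovery, and only shows the bookkeeping of a snapshot plus a list of unacknowledged messages. The names `Process`, `LogicalCheckpoint`, `send`, `on_ack`, and `take_checkpoint` are hypothetical and chosen for illustration.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class LogicalCheckpoint:
    """Process snapshot plus the messages still unacknowledged at checkpoint time."""
    state: dict
    unacked: dict  # message id -> (destination, payload)


@dataclass
class Process:
    # Hypothetical process model; the paper's actual data structures may differ.
    pid: int
    state: dict = field(default_factory=dict)
    unacked: dict = field(default_factory=dict)
    next_msg_id: int = 0

    def send(self, dest: int, payload: str) -> int:
        """Record an application message as unacknowledged and return its id."""
        msg_id = self.next_msg_id
        self.next_msg_id += 1
        self.unacked[msg_id] = (dest, payload)
        return msg_id

    def on_ack(self, msg_id: int) -> None:
        """Drop a message from the unacknowledged list once its ack arrives."""
        self.unacked.pop(msg_id, None)

    def take_checkpoint(self) -> LogicalCheckpoint:
        """Copy the current state and the current unacknowledged list."""
        return LogicalCheckpoint(copy.deepcopy(self.state), dict(self.unacked))


p = Process(pid=0, state={"counter": 1})
first = p.send(dest=1, payload="a")
second = p.send(dest=1, payload="b")
p.on_ack(first)               # only the second message is still outstanding
ckpt = p.take_checkpoint()
p.state["counter"] = 2        # later changes do not affect the checkpoint
print(sorted(ckpt.unacked))   # [1]
print(ckpt.state)             # {'counter': 1}
```

Because the checkpoint carries the unacknowledged messages with it, a process restored from it would still know which messages it may need to resend, which is consistent with the abstract's remark that processes need not track application message dependencies while checkpointing.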

Similar resources

DALD:-Distributed-Asynchronous-Local-Decontamination Algorithm in Arbitrary Graphs

Network environments can always be invaded by intruder agents. In networks where nodes are performing some computations, intruder agents might contaminate some nodes. Therefore, the problem of decontaminating a network infected by intruder agents is one of the major problems in these networks. In this paper, we present a distributed asynchronous local algorithm for decontaminating a network. In mos...

New Causal Message Logging Protocol with Asynchronous Checkpointing for Distributed Systems

Causal message logging is an efficient approach for tolerating failures of processes in distributed systems because it has the advantages of both pessimistic and optimistic message logging approaches. However, traditional causal message logging protocols prevent live processes from continuously executing their computation and require some synchronous logging to the stable storage during recovery....

Defective in mitotic arrest 1/ring finger 8 is a checkpoint protein that antagonizes the human mitotic exit network.

A molecular pathway homologous to the S. cerevisiae mitotic exit network (MEN) and S. pombe septation initiation network has recently been described in higher eukaryotes and involves the tumor suppressor kinase LATS1 and its subunit MOB1A. The yeast MEN/septation initiation network pathways are regulated by the ubiquitin ligase defective in mitotic arrest 1 (Dma1p), a checkpoint protein that he...

Failure Recovery based on Quasi-Synchronous Checkpointing in Mobile Computing Systems

Mobile computing systems are expected to revolutionize the way computers are used. Mobile hosts have small memory, a relatively slow processor and low power batteries, and communicate over low bandwidth wireless communication links. In this paper, we address the problem of failure recovery in mobile computing systems. Any recovery method for mobile computing systems should take into considerati...

Checkpointing and Recovery Algorithms Using Mobile Agents on a Hamiltonian Topology

Traditional message passing based checkpointing and rollback recovery algorithms perform well for closely coupled systems. In wide area distributed systems these algorithms may incur large overhead due to message passing delay and network traffic. So to design checkpointing and rollback recovery algorithms for wide area distributed systems, mobile agents are introduced. Network topology is assu...

Journal:
  • J. Parallel Distrib. Comput.

Volume 64

Publication date: 2004